- Complex data structures (matrices, lists and data frames)
- Functions in R
- Reading data from files
- vectors (string, number, integer, logic, factor)
- matrices and arrays
- lists
- data frames
2024-03-13
Questions, anybody?
Summary on the lilies exercise
Elements of a vector can be accessed not only by number (index) or with a logical vector, but also by name. You can assign names to the elements of a vector:
person <- c("January", "Weiner", 134)
names(person) <- c("FirstName", "LastName", "Age")
person["FirstName"]
person["Age"]
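A minimal sketch of a side effect worth knowing here: because a vector holds a single data type, the number 134 is stored as a string. The variable `age` below is only introduced for illustration.

```r
# A vector holds one data type, so mixing strings and a number
# converts everything to character.
person <- c("January", "Weiner", 134)
names(person) <- c("FirstName", "LastName", "Age")

class(person)          # "character"
person["Age"]          # the value is the string "134", not a number

# Convert back to a number if you need to calculate with it
age <- as.numeric(person["Age"])
age + 1                # 135

# Several names can be used at once
person[c("LastName", "FirstName")]
```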
samples <- c(1, 10, 23, 42, 13)
samples_n <- length(samples)
Both c and length are functions. They take some arguments (often many of them) and return a single object: a vector, a matrix or something else.
You can always assign the result of the function to a variable.
Sometimes functions return NULL, which is R for “nothing”, but which is still something you can use or assign.
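As a small illustration of both points, the sketch below stores function results in variables, including a NULL. The variable name `samples_names` is made up for this example.

```r
samples <- c(1, 10, 23, 42, 13)

# The result of a function call can be stored in a variable
samples_n <- length(samples)
samples_n              # 5

# names() returns NULL here, because this vector has no names
samples_names <- names(samples)
is.null(samples_names) # TRUE

# NULL can itself be passed on to other functions
length(samples_names)  # 0
```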
In R, there are two special values: TRUE and FALSE. They can be used to create logical vectors.
sel <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
sel
!sel
Comparison operators (>, <, <=, >=, ==, !=) produce logical vectors:
samples <- c(1, 1, 2, 5, 7)
samples > 2
which(samples == 7)
which(samples != 1)
Logical vectors can be used to access elements:
persons <- c("Aphrodite", "Bacchus", "Circe", "Demeter", "Eurypides")
sel <- c(TRUE, TRUE, TRUE, TRUE, FALSE)
persons[sel]
# we can abbreviate the TRUE and FALSE to T and F (avoid)
greek <- persons[ c(T, F, T, T, T) ]
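In practice the logical vector is rarely typed by hand; it usually comes from a comparison. A minimal sketch, where the `ages` vector is invented purely for illustration:

```r
persons <- c("Aphrodite", "Bacchus", "Circe", "Demeter", "Eurypides")
ages    <- c(31, 58, 24, 47, 70)   # made-up values for illustration

# The comparison produces a logical vector ...
sel <- ages > 40
sel                    # FALSE TRUE FALSE TRUE TRUE

# ... which then selects the matching elements
persons[sel]           # "Bacchus" "Demeter" "Eurypides"

# The same in one step, and the negated selection
persons[ages > 40]
persons[!sel]          # "Aphrodite" "Circe"
```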
https://youtu.be/xmeZofFlp78
Back to the lilies - how do you change all values which are greater than 1 to 1?
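One possible answer, sketched on a made-up vector, since the lilies data itself is not reproduced here. The name `counts` and its values are assumptions for illustration.

```r
# Hypothetical counts standing in for one column of the lilies data
counts <- c(0, 1, 3, 0, 7, 1)

# A logical index on the left-hand side replaces only the selected values
counts[counts > 1] <- 1
counts                 # 0 1 1 0 1 1
```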
Create a vector as follows:
samples <- c(1, 10, NA, 15)
NA stands for not available (e.g., missing data)
- What does length(samples) return?
- What does mean(samples) return? Why is that?
- Try the na.rm=TRUE argument of the mean() function. Look up help (?mean) to see how it can be used. What happens now?
- What does the is.na() function return when applied to samples?
- How can you find the NA values? Use is.na and which
- … NA?
The reason we are showing how to create a function is to show you that it is simple, and also because it will help you understand what functions are.
#' Function name
#' Function description
some_name <- function(param1, param2=2) {
## code comment
# <your code goes in here>
}
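To make the template above concrete, here is a small sketch of a complete function. The name `mean_without_na` and its `verbose` parameter are hypothetical, chosen only to show a required parameter, a parameter with a default value, and a return value.

```r
#' mean_without_na
#' Calculate the mean of a vector, reporting how many values were missing.
#' (The name and behaviour are hypothetical, chosen for illustration.)
mean_without_na <- function(x, verbose=TRUE) {
  ## count the missing values
  n_missing <- sum(is.na(x))
  if (verbose) {
    message("Ignoring ", n_missing, " missing value(s)")
  }
  ## the value of the last expression is what the function returns
  mean(x, na.rm=TRUE)
}

samples <- c(1, 10, NA, 15)
mean(samples)                            # NA
mean_without_na(samples)                 # 8.666667
mean_without_na(samples, verbose=FALSE)  # same result, no message
```

Since `verbose` has a default value, it can be left out when calling the function; `x` has no default and must always be given.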
Much like vectors, matrices can only hold one data type (e.g. only numeric or only character or only logical etc.).
m <- matrix(1:18, ncol=3, nrow=6)
# compare with
m <- matrix(1:18, ncol=3, nrow=6, byrow=TRUE)
dim(m)
ncol(m)
nrow(m)
matrix[row, column]
So, for example:
m[1, ] # vector which is the first row
m[, 2] # vector which is the second column
m[3, 1] # first element of the third row
We can name rows and columns of a matrix and use the names to access the rows and columns:
colnames(m) <- letters[1:ncol(m)]
rownames(m) <- LETTERS[1:nrow(m)]
m["A", "b"] # one "cell"
m["B", ] # one row
m[ , "b"] # one column
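The following sketch puts the two filling orders side by side and combines names with numeric indices. The names `m_col` and `m_row` are only used here to keep both matrices around.

```r
# Column-wise filling is the default; byrow=TRUE fills row by row
m_col <- matrix(1:18, ncol=3, nrow=6)
m_row <- matrix(1:18, ncol=3, nrow=6, byrow=TRUE)

m_col[1, ]             # 1 7 13
m_row[1, ]             # 1 2 3

# Names work alongside numeric indices
colnames(m_col) <- letters[1:ncol(m_col)]
rownames(m_col) <- LETTERS[1:nrow(m_col)]
m_col["B", "c"]        # 14, same as m_col[2, 3]

# Selecting a single row or column returns a vector;
# drop=FALSE keeps the result a matrix
dim(m_col[, "b", drop=FALSE])   # 6 1
```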
Assume you have a 48-well plate for a drug sensitivity analysis with viability scores.
Create a matrix of random numbers using matrix and runif. These reflect your viability scores.
Before starting your experiment, you decided to leave out the border wells to avoid edge effects:
The rows are treated with inhibitor 1 at increasing concentrations (control, low, medium, high). Columns 2 to 4 are treated with inhibitor 2 at increasing concentrations (control, low, high) and columns 5 to 7 are treated with inhibitor 3 (same concentrations as inhibitor 2).
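A minimal sketch of how such a plate could be set up, assuming the 48 wells are laid out as 6 rows by 8 columns and that unused border wells are marked as NA. All row and column labels are made up for illustration.

```r
set.seed(1)  # only to make the random numbers reproducible

# 48 wells laid out as 6 rows x 8 columns (assumed layout)
plate <- matrix(runif(48), nrow=6, ncol=8)

# Border wells are not used: mark them as missing
plate[c(1, nrow(plate)), ] <- NA
plate[, c(1, ncol(plate))] <- NA

# Label the wells; these labels are made up for illustration
rownames(plate) <- c("top", "inh1_control", "inh1_low",
                     "inh1_medium", "inh1_high", "bottom")
colnames(plate) <- c("left",
                     paste0("inh2_", c("control", "low", "high")),
                     paste0("inh3_", c("control", "low", "high")),
                     "right")

# Viability for inhibitor 3 only (columns 5 to 7, inner rows)
plate[2:5, 5:7]

sum(!is.na(plate))     # 24 wells carry measurements
```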
Lists are created with the list() function:
person <- list(name="Weiner",
               Age=NA,
               given="January")
To access an element of a list, you need to use double brackets [[
person[["name"]]
There is a shortcut:
person$name
If you use single brackets [, you will get a piece of the “clothesline”, that is, you will produce a smaller list.
person["name"]
class(person)
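The difference between the two kinds of brackets is easiest to see by checking the class of the result. A short sketch; the `scores` element is invented for illustration.

```r
person <- list(name="Weiner", Age=NA, given="January")

# Double brackets and $ return the element itself
class(person[["name"]])   # "character"
class(person$name)        # "character"

# Single brackets return a shorter list containing that element
class(person["name"])     # "list"
length(person[c("name", "given")])  # 2

# Unlike a vector, a list can mix types, including whole vectors
person$scores <- c(1.5, 2.5)   # 'scores' is a made-up element
person[["scores"]][2]     # 2.5
```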
Caveats:
- To get a single element, use [[, not [
- List elements can have names (names()), but don’t have to
Data frames are a bit like matrices, but every column can store a different type of data. In this, they are more like lists (which they in fact are).
names <- c("January", "Manuela", "Bill")
lastn <- c("Weiner", "Benary", "Gates")
age <- c(1001, NA, 65)
d <- data.frame(names=names, last_names=lastn, age=age)
class(d)
class(d[,1])
class(d[,3])
You can access the data frame elements much like the elements of a matrix.
However, since data frames are lists, the list operator ($) also works:
d$names # same as d[,1] or d[, "names"]
d$last_names
d$last_names[1]
However, note that when you select a row, you will get a data frame, not a vector. This is because each of the columns can be of a different type, and vectors can hold only one type of data.
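A brief sketch of this behaviour, rebuilding the same data frame so that the example runs on its own:

```r
d <- data.frame(names=c("January", "Manuela", "Bill"),
                last_names=c("Weiner", "Benary", "Gates"),
                age=c(1001, NA, 65))

# A column holds one type, so selecting it gives a vector
class(d[, "age"])      # "numeric"
class(d$age)           # "numeric"

# A row spans several types, so it stays a data frame
class(d[1, ])          # "data.frame"

# Rows can be selected with a logical vector, just like vector elements
d[!is.na(d$age) & d$age > 100, ]
```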
Caveats:
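- Before R 4.0, data.frame() converted character columns to factors by default; set stringsAsFactors=FALSE to prevent this
Gory details: matrices are a basic data type. Data frames are lists.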
Caveats:
tibbles are the data frames from tidyverse
Whatever you can do to a data frame, you can do to a tibble as well
read_* functions return a tibble
tibbles do not have row names
If you select a single row in a data frame, you get a smaller data frame. If you select a single column, you get a vector.
In a tibble, you always get a smaller tibble.
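The difference can be checked directly. This is a sketch that assumes the tibble package is installed; the tibble part is skipped otherwise, and the column names are made up.

```r
d <- data.frame(id=1:3, value=c(2.5, 3.5, 4.5))

class(d[, "value"])              # "numeric": a vector
class(d[, "value", drop=FALSE])  # "data.frame"

# The tibble part only runs if the tibble package is installed
if (requireNamespace("tibble", quietly=TRUE)) {
  tb <- tibble::as_tibble(d)
  print(class(tb[, "value"]))    # still a tibble, not a vector
  print(class(tb[["value"]]))    # "numeric": [[ and $ give the vector
}
```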
https://youtu.be/eWu7kvNBpyc
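- Create a matrix of random numbers using matrix and rnorm.
- Convert it to a data frame. Use as.data.frame for that.
- … Use the rep function for that.
- … Use the seq function for that.
rep() is used to replicate vectors.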
rep(c("A", "B"), 5)
# result:
# [1] "A" "B" "A" "B" "A" "B" "A" "B" "A" "B"
rep(c("A", "B"), each=5)
# result
# [1] "A" "A" "A" "A" "A" "B" "B" "B" "B" "B"
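seq() is the counterpart for regular sequences. A short sketch of a few common calls, together with one way of combining it with rep(); the `group` and `id` vectors are invented for illustration.

```r
# seq() creates regular sequences
seq(1, 10, by=3)             # 1 4 7 10
seq(0, 1, length.out=5)      # 0.00 0.25 0.50 0.75 1.00
seq_len(4)                   # 1 2 3 4

# rep() and seq_along() combine well, e.g. to label samples
group <- rep(c("control", "treated"), each=3)
id    <- seq_along(group)
paste0(group, "_", id)
# [1] "control_1" "control_2" "control_3" "treated_4" "treated_5" "treated_6"

# 'times' can also be a vector: one count per element
rep(c("A", "B"), times=c(2, 3))   # "A" "A" "B" "B" "B"
```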
[ ] … accessing the element of a vector / matrix / list / data frame -> extraction operators
[[ ]] … accessing the element/items of a list
$ … accessing elements by name
( ) … used when calling a function to provide arguments
{ } … indicating a block, e.g. when defining a function